Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP+ENH: Add REMAP action for automatically remapping (instance) UIDs #203

Closed
wants to merge 2 commits into from

Conversation

moloney
Copy link

@moloney moloney commented Mar 17, 2022

Description

This PR improves (instance) UID handling by allowing them to be automatically remapped through a new "REMAP" action. The new UIDs are created in a way that strips all PHI (usually timestamps embedded into the UID) while preserving the study/series hierarchy (defined by the UIDs) in the anonymized output. This allows the anonymized data to be(for example) uploaded to a PACS or other network entity.

Checklist

  • I have commented my code, particularly in hard-to-understand areas
  • My changes generate no new warnings
  • My code follows the style guidelines of this project

Open questions

I also updated the default recipe to handle UIDs in this way, if you prefer I don't do that (at least for now) let me know.

moloney added 2 commits March 17, 2022 15:36
Maintains study / series relationships since the same input UID
always produces the same output UID, while removing any potential
PHI (output is generated with cryptograhically secure hash
function).
@vsoch
Copy link
Member

vsoch commented Mar 17, 2022

Hey thanks for the PR!

So this seems very hard coded for your specific use case (which is okay) but I want to challenge you to think of this more generally. E.g., what you really are doing is running a custom function given a specific VR value. E.g.,:

field.element.VR == 'UI

So why is this not just a replace with a custom function? The function is handed the field so you'd just do that check within the function and act appropriately. Is there a reason this approach will not work?

@vsoch
Copy link
Member

vsoch commented Mar 17, 2022

E.g.:

REPLACE contains:((?<!SOPClass)UID) func:remap_uid
def remap_uid(item, value, field):
    """Remap existing UID in stable and secure manner
    
    Same input UID creates the same output UID, which keeps study / series
    associations correct (or even FrameOfReferenceUID) provided the full
    study / series in anonymized. At same time input UID can't be recovered
    from output so any potential PHI (i.e. dates / times) is eliminated.
    """
    if field.element.VR == 'UI':
        new_val = remap_uid(field.element)

        if isinstance(field.value, list):
            # Handle VM > 1 
            return [generate_uid(entropy_srcs=[x]) for x in field.value]
        else:
            return generate_uid(entropy_srcs=[field.value])

    # default to return the new value
    return new_val

@moloney
Copy link
Author

moloney commented Mar 17, 2022

I agree this could just be a separate function rather than a dedicated "action". The line is a little blurry, JITTER could just be a function too, right? Does the 'deid' package provide a library of useful functions like this that could then be reference in the default recipe? I think this remapping behavior for UIDs is a much better default than stripping them as the latter produces essentially "broken" DICOM files.

@vsoch
Copy link
Member

vsoch commented Mar 17, 2022

The difference is that JITTER is an explicit but general action, and one that is generally requested across this process so the dates aren't the same. This REMAP isn't really intuitive, arguably it could describe something else, and it's on the same par as a custom uid function. So I think in my mind, the two are fairly different.

Does the 'deid' package provide a library of useful functions like this that could then be reference in the default recipe? I think this remapping behavior for UIDs is a much better default than stripping them as the latter produces essentially "broken" DICOM files.

But I love this idea! Indeed it would be really useful to provide custom functions, and make it easy to contribute and then use! If you want to talk about a design to support that (and then make it easier for the user to use as opposed to needing to roll their own as you have) I'd definitely be open to this new addition. Perhaps the current deid/dicom/actions.py could be renamed to a module folder, e.g.,:

deid/dicom/actions/
    __init__.py
   jitter.py             <- the current jitter timestamp function

and JITTER would be grandfathered into being it's own term (for the reasons I specified) but technically it's still an action and could go alongside the actions (I'll leave the organization up to you - I optimize to make it easier for the developer to find things). But then perhaps we could have a uids set of actions:

deid/dicom/actions/
    __init__.py
   jitter.py             <- the current jitter timestamp function
   uids.py

Into which we can put the documentation example, along with your function here! And then for usage:

REPLACE contains:((?<!SOPClass)UID) deid_func:remap_uid

and all the custom functions would be imported into init.py and thus accessible via deid.dicom.actions.<name> so you'd parse that it's a deid_func, get the string, and then use importlib to get it. Then we would have a nice table of custom functions defined, and instructions for adding a custom function (basically writing a new module in that file).

Let me know your thoughts! That's just a quick idea for a design.

@moloney
Copy link
Author

moloney commented Mar 18, 2022

I agree, that is a better approach and will make it much easier to add more custom functions in the future. I will close this and open a different PR.

@moloney moloney closed this Mar 18, 2022
@vsoch
Copy link
Member

vsoch commented Mar 18, 2022

Awesome! Looking forward to seeing it - please ping me if you want to have any discussion, etc.

@wetzelj
Copy link
Contributor

wetzelj commented Mar 18, 2022

This is a wonderful idea!

@vsoch
Copy link
Member

vsoch commented Apr 14, 2022

@moloney and @wetzelj please see #208 !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants